For any β-smooth differentiable function ff (convex or not), if we run GD for TT steps, we can find a point 𝐱̂\hat{\mathbf{x}} such that: ||∇f(𝐱̂)||22≤2βT(f(𝐱(0))−f(𝐱*))||\nabla f(\hat{\mathbf{x}})||_2^2 \leq \frac{2\beta}{T}\left(f(\mathbf{x}^{(0)})-f(\mathbf{x}^*)\right)
If T≥2βTT \geq \frac{2\beta}{T}, then ||∇f(𝐱̂)||22≤ϵ(f(𝐱(0))−f(𝐱*))||\nabla f(\hat{\mathbf{x}})||_2^2 \leq \epsilon \left(f(\mathbf{x}^{(0)})-f(\mathbf{x}^*)\right).
see: Stationary point